A Ground Truth Bleed-Through Document Image Database

Authors

  • Róisín Rowley-Brooke
  • François Pitié
  • Anil C. Kokaram
Abstract

This paper introduces a new database of 25 recto/verso image pairs from documents suffering from bleed-through degradation, together with manually created foreground text masks. The structure and creation of the database are described, and three bleed-through restoration methods are compared in two ways: visually, and quantitatively using the ground truth masks.

Similar resources

Document perceptual quality ground truth creation

This article focuses on a new method for document perceptual quality ground truth creation. This type of ground truth gives a quality related score to each image in a dataset. This is useful for performance evaluation of algorithms that measure the quality of images. The quality of a document image is related to the amount of its degradations. To our knowledge, a methodology to create this kind...

Document Image Binarization

The principal stage of the document image analysis procedure is binarization, in which pixels are classified into text and background. It is a crucial stage that can affect further stages, including the final character recognition stage. This thesis is focused on document image binarization, including both binarization techniques and evaluation methodologies. Specifically, accordin...

Automatic Assessment of OCR Quality in Historical Documents

Mass digitization of historical documents is a challenging problem for optical character recognition (OCR) tools. Issues include noisy backgrounds and faded text due to aging, border/marginal noise, bleed-through, skewing, warping, as well as irregular fonts and page layouts. As a result, OCR tools often produce a large number of spurious bounding boxes (BBs) in addition to those that correspon...

Objective Quality Measurement for Geometric Document Image Restoration

Many algorithms to remove distortion from document images have been proposed in recent years, but so far there is no reliable method for comparing their performance. In this paper we propose a collection of methods to measure the quality of such restoration algorithms for document images which show a non-linear distortion due to perspective or page curl. For the results from these measurements to be...

Performance Evaluation of Document Structure Extraction Algorithms

This paper presents a performance metric for the document structure extraction algorithms by finding the correspondences between detected entities and ground truth. We describe a method for determining an algorithm’s optimal tuning parameters. We evaluate a group of document layout analysis algorithms on 1600 images from the UW-III Document Image Database, and the quantitative performance measu...

Publication date: 2012